ABSTRACT

Research on the major computerized adaptive testing
(CAT) strategies is reviewed, and some findings are reported that
examine effects of examinee demographic and psychological 
characteristics on CAT strategies. In fixed branching strategies, all 
examinees respond to a common routing test, the score of which is 
used to assign examinees to a second-stage test. The currently 
popular statistically branched adaptive strategies are based on 
item-response theory, and include maximum likelihood strategy and 
Bayesian strategy. Two alternative strategies are the use of 
self-adapted testing and testlet strategies. Examinee characteristic 
variables are divided into: (1) demographic variables; (2)
computer-use variables; (3) test-taking strategy variables; (4) 
cognitive characteristics; and (5) affective characteristics. 
Although research on the relationship between examinee psychological 
characteristics and CAT has been inconclusive, the basic findings are 
that examinees of different ethnic, gender, age, grade, ability, 
academic self-concept, test anxiety, computer anxiety, math anxiety, 
and computer experience groups are differentially affected by the 
adaptive testing strategies. Implications for research and practice 
are discussed. (Contains 67 references.) (SLD)
INDIVIDUAL DIFFERENCES IN COMPUTERIZED ADAPTIVE TESTING



Computerized adaptive testing (CAT) is an efficient and viable alternative to
paper-and-pencil testing. Recent item response theory research and advances in microcomputer
technology have shown that a CAT can adapt during test administration according to student
performance on each test item. In a CAT, test items are selected according to an algorithm that
attempts to maximize the efficiency of a test by providing the maximum amount of information
about an examinee's ability with the minimum number of items.

There are some areas that require further investigation in computerized adaptive testing. 
Current popular research issues are the reliability and validity of CATs, the ordering of items, the 
ability estimation procedures, and the context of item administration in the estimation of 
parameters (Wise & Plake, 1989). Research has paid much attention to the efficiency and 
precision of the examinee's ability estimation. However, the research on the effects of examinees' 
demographic and psychological characteristics on CAT has been largely neglected. In other
words, how individual differences among examinees are systematically related to the adaptive
testing strategies has not been extensively studied. In fact, there have been more studies on
examinees' individual differences in conventional tests or computerized tests (CT) than in 
computerized adaptive tests. Recently, a few researchers have paid attention to the individual 
differences in CATs (Buhr & Legg, 1989; Legg & Buhr, 1992; Vispoel & Rocklin, 1993). 
Therefore, a comprehensive review on individual differences is needed to suggest some guidelines 
for further investigations of CATs. The purpose of this paper is to critically review the research 
on the major computerized adaptive testing strategies and to report the findings of some studies 
that examined the effects of examinee demographic and psychological characteristics on the 
computerized adaptive testing strategies. 
Computerized Adaptive Testing Strategies

Adaptive testing is an alternative procedure that tailors the set of test items to each
examinee, based on the examinee's previous item responses or estimated ability, during the
administration of a test. Generally, an operational adaptive test requires four components: item
pool, item selection procedure, scoring procedure, and stopping rules. There are different
approaches to selecting items from a pool, choosing beginning points, scoring individuals, and
terminating the test for different individuals (Assessment Systems Corporation, 1989; Kingsbury &
Zara, 1989; Reckase, 1989; Powell, 1991; Weiss, 1985). In this paper, the following three adaptive
testing strategies are discussed: fixed-branching strategies, statistically branched strategies,
and alternative strategies.

Fixed-Branching Adaptive Strategies 

The typical examples of the fixed-branching adaptive strategies are the two-stage test,
the pyramidal test, and the stradaptive test. The two-stage tests (Angoff & Huddleston, 1958; Betz &
Weiss, 1973, 1974; Kim & Plake, 1993; Lord, 1980; Weiss, 1974) are composed of a routing test 
and a measurement test. All examinees first respond to a common routing test. The routing test 
is typically a test of average difficulty. The score on that test is then used to assign each 
examinee to a second-stage measurement test. Responses to both tests are used to arrive at a 
final score. In a two-stage adaptive test, individuals who answer all or most of the items correctly 
on the short routing test receive a measurement test of higher difficulty; individuals who answer 
about half of the items correctly on the routing test receive a second-stage measurement test of 
average difficulty, and individuals who answer only a few items correctly on the routing test 
receive a second-stage measurement test of low difficulty. 
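
The routing step described above can be sketched in a few lines of Python. This is only an illustration: the function name and the two cut points (0.3 and 0.7 of the routing items correct) are assumptions chosen for the example, not values taken from the studies cited.

```python
def assign_second_stage(num_correct: int, num_items: int,
                        low_cut: float = 0.3, high_cut: float = 0.7) -> str:
    """Route an examinee to a second-stage measurement test.

    low_cut and high_cut are hypothetical proportion-correct thresholds.
    """
    if num_items <= 0:
        raise ValueError("routing test must contain at least one item")
    proportion = num_correct / num_items
    if proportion >= high_cut:
        return "high difficulty"      # all or most routing items correct
    if proportion <= low_cut:
        return "low difficulty"       # only a few routing items correct
    return "average difficulty"       # about half of the routing items correct


print(assign_second_stage(9, 10))   # high difficulty
print(assign_second_stage(5, 10))   # average difficulty
print(assign_second_stage(2, 10))   # low difficulty
```

In an operational two-stage test the final score would combine responses from both stages; the sketch covers only the assignment rule.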

The most popular example of multistage tests is the pyramidal test (Bayoff, Thomas, & 
Anderson, 1960; Larkin & Weiss, 1974; Lord, 1970; Weiss, 1974). The pyramidal test consists of a 
set of items prestructured by difficulty into a structure resembling a pyramid. At the top of the 
pyramid is an item of average difficulty. At the next stage of the test are two items, one of which
is slightly more difficult than the first item, with the other item slightly less difficult than that item. 
The branching rule used in a pyramidal adaptive test is that the slightly more difficult item is 
administered following a correct response to an item, and the slightly easier item is administered 
following an incorrect response to an item. 
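
The pyramidal branching rule can be illustrated with a short sketch. The indexing scheme is an assumption made for the example: stage s is taken to hold s + 1 items ordered from easiest (position 0) to hardest (position s), so the single item at stage 0 is the item of average difficulty.

```python
def pyramid_path(responses):
    """Return the (stage, position) of each item reached in the pyramid.

    The last entry is the item that would be administered next.
    """
    stage, position = 0, 0
    path = [(stage, position)]
    for correct in responses:
        stage += 1
        if correct:
            position += 1   # move to the slightly harder item in the next row
        # after an incorrect response the same index is the slightly easier item
        path.append((stage, position))
    return path


print(pyramid_path([True, False, True]))
# [(0, 0), (1, 1), (2, 1), (3, 2)]
```

Under this layout, the difficulty of an item relative to the starting item is proportional to 2 * position - stage, so an examinee who alternates correct and incorrect responses stays near average difficulty.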

The best example of this type of strategy is the stradaptive test (Vale & Weiss, 1975a, 
1975b, 1978; Weiss, 1973) in which several subtests are defined, each containing items at a 
specified difficulty level. In a stradaptive test, testing proceeds by administering an item in one 
stratum and then branching to a more difficult stratum if the item is answered correctly or to a 
less difficult stratum if it is answered incorrectly. Whenever an examinee branches to a stratum, 
the next previously unadministered item in that stratum is administered to the examinee. 
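
A minimal sketch of stradaptive branching follows. The text does not say what happens at the easiest or hardest stratum or when a stratum runs out of items; staying at the boundary stratum and stopping on exhaustion are assumptions made here so that the example runs.

```python
def stradaptive_sequence(strata, responses, start_stratum):
    """Return the item ids administered, in order.

    strata: list of item-id lists, ordered from easiest to hardest stratum.
    responses: booleans, True for a correct answer.
    """
    next_unused = [0] * len(strata)   # index of the next unadministered item
    current = start_stratum
    administered = []
    for correct in responses:
        if next_unused[current] >= len(strata[current]):
            break                     # assumption: stop when a stratum is exhausted
        administered.append(strata[current][next_unused[current]])
        next_unused[current] += 1
        if correct:
            current = min(current + 1, len(strata) - 1)   # harder stratum
        else:
            current = max(current - 1, 0)                 # easier stratum
    return administered


strata = [["e1", "e2"], ["m1", "m2"], ["h1", "h2"]]
print(stradaptive_sequence(strata, [True, False, True, True], start_stratum=1))
# ['m1', 'h1', 'm2', 'h2']
```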

The fixed-branching adaptive strategies are useful if adaptive tests are to be administered 
by paper-and-pencil or by simple testing machines. However, these approaches to adaptive testing 
have several limitations (Weiss, 1985). A primary limitation is that they generally use only item 
difficulty information in order to structure the item pool. The second problem is how to develop 
scoring methods that can be used when different items are answered by different examinees. The 
third problem is that the strategies are designed for fixed-length test administration with the 
exception of the stradaptive test. 

Statistically-Branched Adaptive Strategies 

The currently popular approaches for adaptive testing were developed during the late 
1960s and early 1970s based on item response theory (IRT) methodology. IRT is a statistical 
theory consisting of a family of models that express the probability of observing a particular 
response to an item as a function of certain characteristics of the item and of the ability level of 
the examinee (Crocker & Algina, 1986; Hambleton & Swaminathan, 1985). The typical IRT-based
strategies are the maximum likelihood strategy and the Bayesian strategy (Weiss & Kingsbury,
1984).

Maximum Likelihood Strategy : The likelihood function for a set of test items indicates the 
probability of observing the entire vector of obtained item responses at each level of ability. 
From this likelihood function, an estimate of the examinee's ability can be obtained. 
Conceptually, this can be done by assuming that the best estimate of an examinee's ability is the
level of ability that would most likely produce the vector of responses observed. This is
determined by locating the maximum value of the likelihood function and identifying the ability
level (theta) associated with that maximum. This score is called the maximum likelihood estimate
of ability. The maximum information adaptive testing strategy (Weiss, 1982, 1985) selects items
that provide maximum levels of item information at an individual's currently estimated trait level.
After the administration of an item and estimation of trait level, the new trait level is used to 
select the next item to be administered to that examinee. In a maximum information adaptive 
test, a sequential process is specified in which an item is administered, an ability estimate is 
calculated, the item providing the most information at that estimate is selected, and the process is 
repeated. This process will continue until a fixed number of items has been administered or until 
some other criterion for termination has been satisfied. 
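
The estimate-then-select cycle can be sketched as follows. The sketch assumes a two-parameter logistic (2PL) model, one member of the IRT family, with hypothetical discrimination (a) and difficulty (b) values, and it locates the maximum of the likelihood by a simple grid search rather than the iterative procedures an operational system would normally use.

```python
import math

GRID = [i / 10.0 for i in range(-40, 41)]   # candidate theta values, -4.0 to 4.0


def p_correct(theta, a, b):
    """2PL probability of a correct response (an assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, items, responses):
    """Log-likelihood of the whole response vector at a given theta."""
    total = 0.0
    for (a, b), u in zip(items, responses):
        p = p_correct(theta, a, b)
        total += math.log(p) if u else math.log(1.0 - p)
    return total


def ml_estimate(items, responses):
    """Grid value of theta at which the likelihood is largest."""
    return max(GRID, key=lambda t: log_likelihood(t, items, responses))


def item_information(theta, a, b):
    """2PL item information at theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def select_next_item(theta, pool, administered):
    """Index of the unadministered item with maximum information at theta."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *pool[i]))


pool = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.5, 0.5)]   # hypothetical (a, b) pairs
theta_hat = ml_estimate([pool[1], pool[2]], [1, 0])        # item 1 correct, item 2 incorrect
print(theta_hat)                                           # 0.5
print(select_next_item(theta_hat, pool, {1, 2}))           # 3
```

When every response so far is correct (or every response incorrect), the likelihood has no interior maximum and the grid search simply returns an endpoint of the grid.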

Bayesian Strategy : A Bayesian estimate (e.g., Owen, 1969, 1975) is conceptually very similar to 
the maximum likelihood estimate. Bayesian strategies use Bayes' theorem to estimate an
examinee's ability. Bayes' theorem generates a posterior probability distribution from the
combination of a prior probability distribution and the current observation. The Bayesian 
posterior likelihood function can become a legitimate probability density function with a mean 
and variance. The mean or the mode of the posterior can be taken as an estimate of ability. In 
Bayesian strategies, items are selected on the basis of minimizing the Bayesian posterior variance
of the ability estimate rather than maximizing values of item information (Owen, 1969, 1975).
Owen's item selection strategy utilizes a current ability estimate and its Bayesian variance as the 
prior distribution for the item selection process. In order to select the next item to be 
administered during the adaptive test, this method evaluates the posterior variance of the ability 
estimate for each item in the pool under two conditions: a) if the item is answered correctly and 
b) if the item is answered incorrectly (Weiss, 1982). 
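
The idea of evaluating the posterior variance under both possible outcomes can be sketched numerically. This is not Owen's procedure itself: the sketch substitutes a discrete grid posterior for Owen's closed-form approximation, assumes a 2PL response model and a standard normal prior, and weights the two conditional variances by the predicted probability of each outcome. All item parameters are hypothetical.

```python
import math

GRID = [i / 10.0 for i in range(-40, 41)]   # theta grid, -4.0 to 4.0


def p_correct(theta, a, b):
    """2PL probability of a correct response (an assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def prior():
    """Standard normal prior evaluated on the grid (an assumption)."""
    return normalize([math.exp(-0.5 * t * t) for t in GRID])


def update(dist, a, b, correct):
    """Bayes' theorem: posterior is proportional to prior times likelihood."""
    like = [p_correct(t, a, b) if correct else 1.0 - p_correct(t, a, b) for t in GRID]
    return normalize([d * l for d, l in zip(dist, like)])


def mean_and_variance(dist):
    mean = sum(t * d for t, d in zip(GRID, dist))
    var = sum((t - mean) ** 2 * d for t, d in zip(GRID, dist))
    return mean, var


def expected_posterior_variance(dist, a, b):
    """Average the posterior variance over a correct and an incorrect answer."""
    p_right = sum(p_correct(t, a, b) * d for t, d in zip(GRID, dist))
    var_right = mean_and_variance(update(dist, a, b, True))[1]
    var_wrong = mean_and_variance(update(dist, a, b, False))[1]
    return p_right * var_right + (1.0 - p_right) * var_wrong


def select_next_item(dist, pool, administered):
    """Index of the unadministered item with the smallest expected posterior variance."""
    candidates = [i for i in range(len(pool)) if i not in administered]
    return min(candidates, key=lambda i: expected_posterior_variance(dist, *pool[i]))


pool = [(1.0, 0.0), (1.0, 3.0), (2.0, 0.0)]   # hypothetical (a, b) pairs
start = prior()
first = select_next_item(start, pool, set())
posterior = update(start, *pool[first], True)  # suppose the item is answered correctly
print(first, mean_and_variance(posterior))
```

The posterior mean is used here as the ability estimate; the posterior mode could be used instead.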

According to Weiss and Kingsbury (1984), two efficient item selection procedures are 
maximum information and Bayesian. Both procedures involve searching the entire pool of 
unadministered items for a single item. Because of the relationships between information and 
Bayesian posterior variance, the maximum information strategy and the Bayesian strategy will
frequently select a similar subset of items (Sympson, Weiss, & Ree, 1982). In obtaining ability
estimates, maximum likelihood estimation poses problems when the number of test items is small.
Bayesian procedures overcome the problems encountered with maximum likelihood procedures
but may produce biased estimates of ability if inappropriate prior distributions are chosen
(Hambleton, Swaminathan, & Rogers, 1991).

Alternative Adaptive Strategies 

In this section, two alternatives to the traditional adaptive strategies are discussed:
self-adapted testing and the testlet strategy (Kingsbury & Zara, 1989; Plake, 1993).

Testlet Strategy : The concept of the testlet was introduced explicitly by Wainer and Kiely (1987)
as content-based item clusters which are analyzed as units and are independent of all other 
testlets and items. Thus, the testlet was proposed as the unit of construction and analysis for 
CATs. It could ease some of the observed and prospective difficulties associated with most 
current methods of test construction such as context effects, item ordering, and content balancing 
(Wainer, et al., 1990). 

The procedure of the testlet strategy is as follows. First, specify an initial estimate of
proficiency (this specifies an initial testlet). Second, estimate proficiency after each testlet.
Third, choose the remaining testlet that is most informative near the estimated proficiency to be
administered next. Finally, stop when the precision of the estimated proficiency is adequate, or 
when some pre-specified number of testlets have been administered. 
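
The control flow of this procedure can be sketched as a loop. The scoring and information functions are deliberately left to the caller, because the text does not specify them; the toy stand-ins below (testlet "centers", number-correct scores, and the crude estimate and standard error) are hypothetical and exist only so that the loop can be run.

```python
import math


def run_testlet_cat(testlets, information, administer, estimate,
                    initial_theta=0.0, max_testlets=3, target_se=0.3):
    """Administer testlets one at a time and return (theta, se, order)."""
    theta, se = initial_theta, math.inf
    remaining = list(testlets)
    order, results = [], []
    # Stop when precision is adequate or the testlet budget is spent.
    while remaining and len(order) < max_testlets and se > target_se:
        # Choose the remaining testlet most informative near the current estimate.
        best = max(remaining, key=lambda t: information(t, theta))
        remaining.remove(best)
        order.append(best)
        results.append((best, administer(best)))
        # Re-estimate proficiency after each testlet.
        theta, se = estimate(results)
    return theta, se, order


# Toy stand-ins (all hypothetical).
CENTERS = {"easy": -1.0, "medium": 0.0, "hard": 1.0}   # testlet difficulty
SCORES = {"easy": 3, "medium": 3, "hard": 1}           # items correct out of 3


def toy_information(testlet, theta):
    return -abs(CENTERS[testlet] - theta)   # closer to theta counts as more informative


def toy_estimate(results):
    values = [CENTERS[t] + (score - 1.5) / 1.5 for t, score in results]
    return sum(values) / len(values), 1.0 / math.sqrt(len(values))


theta, se, order = run_testlet_cat(CENTERS, toy_information, SCORES.get, toy_estimate)
print(order)   # ['medium', 'hard', 'easy']
```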

Testlets have been applied to scaling of reading comprehension items (Thissen et al., 
1989), to the measurement of algebra items (Wainer & Lewis, 1990; Wainer, Lewis, Kaplan, & 
Braswell, 1991), and to scaling performance assessment tasks (Yen, 1992). Although applications 
are possible, it is unlikely that a testlet-based adaptive test could be used in most ongoing testing 
situations (Kingsbury & Zara, 1989). 

Self-Adapted Testing : Rocklin and O'Donnell (1987) developed an alternative procedure, called 
self-adapted testing, in which examinees could choose the difficulty of the items they attempt on
an item-by-item basis. This strategy seeks to minimize student anxiety and maximize student
performance by allowing the examinee, rather than a computer algorithm, to choose items
(Rocklin & O'Donnell, 1991; Wise, Plake, Johnson, & Roos, 1992; Wise, Roos, Plake, & 
Nebelsick-Gullett, 1993). 

In self-adapted testing, the student answers a test question, is informed whether his or her
answer was correct, and then decides how difficult the next item on the test should be. To facilitate
student decisions concerning item difficulty, the items are prestructured into difficulty groups or 
strata, as in the stradaptive test. The major difference between the stradaptive test and self- 
adapted test is that the examinee chooses the difficulty stratum, rather than having the computer 
choose it (Kingsbury & Zara, 1989). 

Some research findings have shown that examinees taking the self-adapted test scored
significantly higher than those taking the computerized adaptive test (Rocklin & O'Donnell, 1987;
Vispoel & Rocklin, 1993; Wise, Plake, Johnson, & Roos, 1992). Although this strategy might
reduce test anxiety, it would produce a test score with a very low information value because the
student has the option to take a test that is far below (or above) his or her optimal performance
level (Kingsbury & Zara, 1989).

Examinee Characteristics Variables 

There is some concern that computerized tests, including CAT, may produce differential 
effects for different groups of students (Legg & Buhr, 1992). There are several examinee 
demographic and psychological characteristics that can affect performance on computerized tests.
Parshall and Kromrey (1993) divided examinee characteristics into the following three
sub-categories: (a) demographic variables (gender, racial/ethnic background, and age), (b) computer
use variables (variety of computer experience, frequency of computer use, frequency of mouse use,
and test mode preference), and (c) test taking strategy variables (test strategy preference,
tendency to omit items, and tendency to review items). They did not include examinee cognitive and affective
characteristics in their classification. 

In this paper, examinee characteristics are divided into five sub-categories by adding these
two characteristics. Specific variables indicating individual differences can be found in typical
computerized testing studies. First, demographic variables include gender, race or ethnic
background, grade, and age (Johnson & Mihal, 1973; Johnson & White, 1980; Sorensen, 1985;
Llabre & Froman, 1987; Moe & Johnson, 1988; Parshall & Kromrey, 1993). Second, cognitive
characteristics variables include ability, aptitude, and achievement (Lee, Moreno, & Sympson,
1986; Wise & Wise, 1987). Third, affective characteristics variables include anxiety, test anxiety,
computer anxiety, math anxiety, self-concept, and attitudes (Wise, Plake, Eastman, Boettcher, &
Lukin, 1986; Moe & Johnson, 1988; Wise, Barnes, Harvey, & Plake, 1989). Fourth, computer use
variables include computer experience, computer use, and mouse use (Lee, 1986; Wise, Barnes,
Harvey, & Plake, 1989; Dimock & Cormier, 1991; Parshall & Kromrey, 1993). Finally, the
examples of test taking strategy variables are test strategy preference, tendency to omit items,
tendency to review items, response time, and testwiseness (Rocklin & O'Donnell, 1987; Spray,
Ackerman, Reckase, & Carlson, 1989; Ward, Hooper, & Hannafin, 1989; Wise & Plake, 1989;
Green, 1991).

However, only a few studies (Rocklin & O'Donnell, 1987; Legg & Buhr, 1992) have begun to
investigate the relationship between examinee characteristics and CATs. Two approaches to the
search for individual differences can be distinguished in terms of the testing situation. The
first approach investigates examinee differences in the typical computerized adaptive testing
situation. The second approach explores individual differences in self-adapted testing, in which
each examinee is allowed to choose the level of difficulty of the next item to be presented from
among several levels of difficulty. In this paper, examinees' individual differences will be
described in terms of the above-mentioned variables both in CAT and in self-adapted testing.

Demographic Variables 

The examinee's demographic variables include gender, age, ethnic background, and grade. 
Olsen, Maynes, Slawson, and Ho (1989) compared the effectiveness of paper-administered,
computer-administered, and computerized adaptive achievement tests for grades three and six.
This was a pioneering study to evaluate computerized adaptive testing at the elementary school
level. The investigators found no significant differences between paper-administered tests and 
computer-administered tests for grades three and six. 

Legg and Buhr (1992) looked at how administration of the examination by computer 
affected examinees and, in particular, whether examinees of different ethnic, gender, age, ability, 
and computer-experience groups were differentially affected. They also explored whether group
differences in reactions to the computerized test administration could help to explain observed
differences between CAT and conventional test scores and differences in the time used for
testing. They developed three adaptive tests, in mathematics, reading, and writing,
using the MicroCAT (Assessment Systems Corporation, 1989) software program. The study used
a questionnaire which consisted of 19 Likert-type items and 4 open-ended questions. The
investigators found that little difference was observed between examinees less than 30 years of 
age and those 30 years old and older in their response to the questionnaire. They also found that 
mean scores for males and females differed in two areas. First, male students responded 
significantly less favorably than females. Second, males were more critical than females of the 
graphics for the mathematics items. They showed that mean scores for White, Black, Hispanic 
and Asian ethnic groups differed for several questions. First, Hispanic and Asian students were 
much less likely to indicate that they had had enough practice in responding than White students. 
Second, Asian students reported greater eye strain at the end of the test than other ethnic groups. 
Third, Asian students also differed from other students in their preference for the computer test 
and preferred the regular test, in contrast to the other groups. 

Cognitive Characteristics Variables 

Cognitive characteristic variables include ability, aptitude, and achievement. In general the
related literature is fairly sparse because examinees' abilities, aptitudes, and achievements are
largely used as dependent variables in the CAT research. Schinoff and Steed (1988) reported
that lower proficiency examinees liked a CAT better because they did not feel the discouragement
associated with facing a long string of items that were too hard for them. 

Legg and Buhr (1992) divided examinees into three ability groups based on conventional
reading test scores and defined "low ability" as more than one standard deviation below the mean,
"high ability" as more than one standard deviation above the mean, and "average" as within one
standard deviation of the mean. They found that higher-ability examinees were more bothered
than lower-ability examinees by not being able to review items after completing them.
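
The grouping rule can be made concrete with a short sketch. The scores are invented, and the use of the population standard deviation is an assumption, since the text does not say which form of the standard deviation was used.

```python
import statistics


def classify_ability(scores):
    """Label each score as low, average, or high ability relative to the group."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)   # assumption: population standard deviation
    labels = []
    for score in scores:
        if score < mean - sd:
            labels.append("low ability")      # more than one SD below the mean
        elif score > mean + sd:
            labels.append("high ability")     # more than one SD above the mean
        else:
            labels.append("average")          # within one SD of the mean
    return labels


print(classify_ability([40, 50, 50, 50, 60]))
# ['low ability', 'average', 'average', 'average', 'high ability']
```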

Affective Characteristics Variables 

Examinees' affective characteristics which have been investigated in CAT research are test 
anxiety (Powell, 1991; Vispoel & Rocklin, 1993), computer anxiety (Legg & Buhr, 1992; Vispoel
& Rocklin, 1993), math anxiety (Wise, Roos, Plake, & Nebelsick-Gullett, 1993), self-concept
(Vispoel & Rocklin, 1993), and attitudes toward computerized adaptive tests (Schmidt, Urry, & 
Gugel, 1978; Moe & Johnson, 1988; Vispoel & Rocklin, 1993). Schmidt, Urry, and Gugel (1978) 
investigated examinee reactions to computer assisted tailored testing. They reported the reactions 
and opinions of 163 examinees who participated in a tailored pilot study conducted at the U.S. 
Civil Service Commission during the fall of 1975. They showed that the reactions of the 
examinees were positive. 

Moe and Johnson (1988) studied participants' reactions to a computerized adaptive ability
test and assessed the practicability of this testing method in the classroom. Three hundred fifteen
students took computerized and printed versions of a standardized aptitude test battery, and a
survey assessing their reactions. They found that overall reactions to the computerized test were 
overwhelmingly positive. 

Powell (1991) examined the relationship between test anxiety and test performance in the 
three computerized adaptive testing procedures. She found no statistically significant differences 
among mean student achievement scores, nor among in-test anxiety means under the three 
adaptive testing methods. The study also showed that students who reported higher pre-test 
anxiety scored significantly higher in the matched-selection test, and students who preferred the 
matched-selection and self-selection tests were significantly less anxious during those tests. 

Legg and Buhr (1992) found a significant interaction between ethnic and gender groups 
on computer anxiety. In other words, differences in reported computer anxiety for the ethnic 
groups were much larger for females than males, with Black females reporting the greatest anxiety
and White females the least. For males, the mean for Hispanics was highest, while the mean for
Asian examinees was lowest. In contrast, Hispanic females reported low computer anxiety. The
mean indicated computer anxiety did not differ greatly for White and Black males. Contrary to
expectations, females as a group reported less computer anxiety than males. 

Vispoel and Rocklin (1993) assessed the effects of several individual difference variables 
(test anxiety, verbal self-concept, computer usage, and computer anxiety) on ability estimates 
alone and in interaction with the test administration procedures (maximum information adaptive, 
self-adapted, and fixed-item computerized vocabulary tests). The study used the same large, well- 
calibrated item bank for all the tests. They found significant main effects for test anxiety and self- 
concept variables and a significant self-concept by testing condition interaction. The most striking 
differences among administration conditions occurred for individuals with low verbal self-concepts, 
who performed noticeably better on the self-adapted test than on the other tests. As expected, 
estimated ability scores across the total sample were significantly higher for individuals with higher 
verbal self-concept and lower test anxiety. 

Wise, Roos, Plake, and Nebelsick-Gullett (1993) investigated the relative influences of test 
type and test choice on examinee anxiety. They found that examinees low in math anxiety showed 
a strong preference for CAT, while the highly math anxious examinees chose self-adapted tests.

Computer Use Variables 

Several studies have investigated the relationship between examinee performance on 
computerized tests and computer use variables (Lee, 1986; Wise, Barnes, Harvey, & Plake, 1989; 
Dimock & Cormier, 1991). Legg and Buhr (1992) also investigated computer experience by
self-report in the CAT testing situation. They found that computer experience did not
differ significantly for gender or age groups, while differences were reported for ethnic and ability
groups.

Vispoel and Rocklin (1993) also assessed computer usage by participants' estimates of
the average number of hours per week they spent working on computers. They transformed the scores
for computer usage responses by adding one to each score and taking the base 10 logarithm of 
the sum because their distribution was positively skewed. The study found no significant main
effect for the computer usage variable.
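
The transformation described above, adding one to each score and taking the base 10 logarithm, is shown below with invented usage values. Adding one before taking the logarithm keeps a report of zero hours defined.

```python
import math


def transform_usage(hours_per_week):
    """Apply log10(x + 1) to each reported weekly usage value."""
    return [math.log10(hours + 1) for hours in hours_per_week]


print(transform_usage([0, 9, 99]))   # [0.0, 1.0, 2.0]
```

The transformation compresses large values more than small ones, which reduces positive skew.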

Test Taking Strategy Variables 

The examinees' test taking variables include test taking strategy (Rocklin & O'Donnell, 
1987; Wise, Plake, Johnson, & Roos, 1992; Wise, Roos, Plake, & Nebelsick-Gullett, 1993), 
response time (Gershon, Bergstrom, & Lunz, 1993), and item review (Lunz, Bergstrom, & Wright, 
1992; Lunz & Stahl, 1993). Rocklin and O'Donnell (1987) explored a variant application of IRT
in computerized testing, termed self-adapted testing, in which the difficulty levels of the items 
administered are chosen by the examinee, rather than by a computer algorithm. They found that 
examinees who received a self-adapted test scored significantly higher than examinees receiving a 
conventional computerized test. 

Subsequent research studies (Rocklin & O'Donnell, 1991; Wise, Plake, Johnson, & Roos, 
1992; Roos, Plake, & Wise, 1992) have explicitly compared the self-adapted test and the CAT. These
studies indicated that a self-adapted test yielded higher mean examinee test
performance than a CAT and was accompanied by lower mean post-test state anxiety.

Gershon, Bergstrom, and Lunz (1993) analyzed the total response time for each item in 
CAT. They also divided total response time into initial test time and review time. They found 
that response time increased proportionately with increasing item text length and increasing item 
difficulty. The study showed that item sequence also was an important factor in that response 
time was greater for earlier items in the test. 



FINDINGS AND IMPLICATIONS 

Although the previous research on the relationship between examinee psychological 
characteristics and CAT has been inconclusive, the basic findings are that examinees of different 
ethnic, gender, age, grade, ability, academic self-concept, test anxiety, computer anxiety, math 
anxiety, and computer experience groups were differentially affected by the adaptive testing 
strategies in the related studies. In this section, the findings will be discussed in terms of five sub- 
categories. 

First, research on the relationship between various demographic variables and CAT has 
not been conclusive. Gender, ethnic background, age, and grade are among the demographic 
variables which have been investigated. Some differences were observed between ethnic, gender, 
and age groups in their reactions to CAT, but these differences did not appear to affect the 
examinees' performance on the test. Although decisive evidence of the relationship between 
these variables and CAT has not been obtained, grounds for concern can be found in the results 
of national surveys on the equity of computer access (Becker & Sterling, 1987; Martinez & Mead, 
1988). Lower access to computers could affect performance on a CAT.

Second, cognitive characteristics variables have been largely used as dependent variables in 
the CAT research. One of the findings on these variables is that lower ability examinees did not 
indicate any greater problems in CAT than higher ability examinees. Based on the studies which 
found the difference of ability groups in their reactions to CAT, we can say that lower ability 
examinees have positive attitudes toward CAT. 

Third, affective characteristics variables have been used as both dependent variables and 
independent variables in the CAT research. The findings were that (a) examinee attitude toward 
CAT was generally very positive, (b) computer anxiety in the CAT testing situation was related to 
examinees' ethnic and gender group, (c) test anxiety and verbal self-concept were significantly
related to ability estimates, with higher scores obtained by individuals with higher verbal
self-concepts and lower test anxiety, and (d) a strong relationship was found between examinee test
type choice and math anxiety level. 

Fourth, some aspects of computer use variables have been investigated as potential
sources of differences in CAT performance, but clear-cut results have not been obtained. A pattern
of lower scores for examinees with less computer experience is frequently seen, although the score
differences are often not statistically significant (Wise, Barnes, Harvey, & Plake, 1989). In
addition, those students with less computer experience did not feel that they had enough practice
responding, in comparison to the other two groups (Legg & Buhr, 1992).

Finally, test-taking strategy variables have been investigated as potential sources of
differences in CAT performance. The findings were that a self-adapted test yielded higher mean
scores than a CAT and was accompanied by lower mean post-test state anxiety. Specifically, the
interaction of
examinees' test-taking strategies with test flexibility appears to be important. 

The research findings on examinee characteristics related to computerized adaptive testing 
strategies have some implications for test specialists and researchers. Although examinee 
characteristics may partially account for test performance in the CAT, some researchers also think 
that examinee characteristics play an important role in exploring the practicability of CAT in the 
near future. 

First, the research findings have implications for equity issues in testing. The most 
extensive research has been conducted on the equivalence between computer-based tests and 
their conventional test counterparts. There is a paucity of research on the equivalence between 
CAT and CT or conventional test counterparts. Such research may identify individual difference
variables that influence the equivalence among the three modes of testing. 

Second, the research on examinee characteristics provides empirical evidence of test
validity. Validity refers to the extent to which an inference made from test scores is
appropriate or meaningful (AERA/APA/NCME, 1985). Only by examining the relationships
between scores on a test and the other variables specified in the theory can the validity of test
administration procedures be compared (Vispoel & Rocklin, 1993). The relationship between
examinee characteristics and CAT strategies provides some minimal information about the
construct-related validity of the test administration procedures.

Third, the findings suggest that more practice items should be added to the actual tests for
some subgroups of examinees. Adding one or two practice items would give examinees more
opportunity to master the scrolling and might alleviate the problem of insufficient practice.

Finally, cultural differences in CAT should be explored through cross-cultural studies.
There are no cross-cultural studies on examinees' performance on and attitudes toward CAT. Such
studies would indicate whether CAT can be widely used throughout the world. This issue deserves
primary attention in future investigations of computerized adaptive testing.